MOMENT ANALYSIS OF TERTIARY PROTEIN STRUCTURES 



Field of the Invention 

The present invention relates to protein structure analysis and, more particularly,
to providing a moment analysis of tertiary protein structures.

Background of the Invention 

Proteins are composed of a series of amino acid residues. There are 20 known
naturally occurring amino acids. The three-dimensional structure of a protein is typically
composed of a series of folded regions. Current research has focused on protein
structural determination because three-dimensional protein structure is important for all
human bodily functions.

Many proteins are globular and form in an aqueous environment. These globular
proteins are composed of hydrophobic amino acids that avoid water, and hydrophilic
amino acids that are attracted to water. When these proteins fold, the hydrophobic amino
acids are predominantly arranged in the non-aqueous center of the protein molecule and
the hydrophilic amino acids are arranged on the aqueous protein surface. A protein
formed in this manner will have a hydrophobic core and a hydrophilic exterior. In
addition to this inside-to-outside radial distribution of hydrophobic and hydrophilic amino
acids, there is a gradient of the hydrophobicity of amino acids across the linear extent of
the protein. This gradient is important since, in many instances, it points to local regions
that are involved in protein function. Many of these protein functional regions consist of
a predominance of hydrophilic amino acids. In binding to lipid bilayers, however, these
regions may consist of hydrophobic amino acid residues.

The spatial distribution of hydrophobic and hydrophilic amino acids from the protein
interior to exterior has been profiled previously in B.D. Silverman, Hydrophobic Moments
of Protein Structures: Spatially Profiling the Distribution, 98 PROC. NATL. ACAD. SCI.
4996-5001 (2001). Previous methods involved the determination of a helical hydrophobic
moment that provides a measure of the amphiphilicity of a segment of a secondary protein
structure. See, for example, D. Eisenberg et al., The Helical Hydrophobic Moment: a
Measure of the Amphiphilicity of a Helix, 299 NATURE 371-74 (1982); D. Eisenberg et al.,
Analysis of Membrane Protein Sequences With the Hydrophobic Moment Plot, 179 J. MOL.
BIOL. 125-142 (1984); H.J. Pownall et al., Helical Amphipathic Moment: Application to
Plasma Lipoproteins, 159 FEBS 17-23 (1983); I. Tsigelny et al., Mechanism of Action of
Chromogranin A on Catecholamine Release: Molecular Modeling of the Catestatin Region
Reveals a β-strand/loop/β-strand Structure Secured by Hydrophobic Interactions and
Predictive of Activity, 77 REGULATORY PEPTIDES 43-53 (1998); J.P. Pardo et al., An
Alternative Model for the Transmembrane Segments of the Yeast H+-ATPase, 15 YEAST
1585-93 (1999); P.W. Mobley, Membrane Interactions of the Synthetic N-terminal Peptide
of HIV-1 gp41 and its Structural Analogs, 1418 BIOCHIMICA ET BIOPHYSICA ACTA 1-18
(1999); L. Thong et al., Flexible Programs for the Prediction of Average Amphiphilicity
of Multiply Aligned Homologous Proteins: Application to Integral Membrane Transport
Proteins, 16 MOLECULAR MEMBRANE BIOLOGY 173-79 (1999); X. Gallet et al., A Fast
Method to Predict Protein Interaction Sites from Sequences, 302 J. MOL. BIOL. 917-926
(2000); D.A. Phoenix et al., The Hydrophobic Moment and its Use in the Classification of
Amphiphilic Structures (Review), 19 MOLECULAR MEMBRANE BIOLOGY 1-10 (2002).

While determination of the hydrophobic moments of secondary structures is
useful, it is desirable to have measurements pertaining to the entire protein structure.
These measurements would yield information useful in protein structure classification 
and functional region determination. 

Summary of the Invention

Techniques for protein structure analysis are provided. In one aspect of the 
present invention, a method for calculating a moment of a tertiary protein structure 
comprising a plurality of residues is provided. A centroid of residue centroids is 
calculated. The centroid of residue centroids is used as a spatial origin of a global linear
hydrophobic moment. The correlation between residue centroid magnitude and residue 
solvent accessibility is enhanced. The global linear hydrophobic moment is defined, 
wherein each of the residue centroids contributes a magnitude and direction to the global 
linear hydrophobic moment. 

In another aspect of the present invention, a method for comparing at least two
tertiary protein structures comprising a plurality of residues is provided. For each tertiary
protein structure, the method comprises the following steps. A centroid of residue
centroids is calculated. The centroid of residue centroids is used as a spatial origin of a
global linear hydrophobic moment. The correlation between residue centroid magnitude
and residue solvent accessibility is enhanced. The global linear hydrophobic moment is
defined, wherein each of the residue centroids contributes a magnitude and direction to
the global linear hydrophobic moment. The global linear hydrophobic moment
characterizes an amphiphilicity of each tertiary protein structure. The global linear
hydrophobic moment of each tertiary protein structure is used to compare the
amphiphilicity of the at least two tertiary protein structures.

A more complete understanding of the present invention, as well as further 
features and advantages of the present invention, will be obtained by reference to the 
following detailed description and drawings. 

Brief Description of the Drawings

FIG. 1 is a flow chart illustrating an exemplary methodology for calculating a
moment of a tertiary protein structure comprising a plurality of residues according to the
teachings of the present invention;

FIG. 2 is a diagram illustrating lever arm dependence of a hydrophobic moment
according to the teachings of the present invention;

FIG. 3 is a table containing correlation coefficients of distance and solvent
accessibility for soluble globular Protein Data Bank (PDB) protein structures;

FIG. 4 is a block diagram of an exemplary hardware implementation of a method
for calculating a moment of a tertiary protein structure comprising a plurality of residues
according to the teachings of the present invention;

FIG. 5 is a table containing global linear hydrophobic moment magnitudes for
fifty PDB protein structures according to the teachings of the present invention;

FIG. 6 is a table containing protein hydrophobicity values according to the
Neumaier hydrophobicity scale;

FIGS. 7A-D are histograms illustrating random distributions of global linear
hydrophobic moment magnitudes and relationship to four native moments that exhibit
significant amphiphilicity according to the teachings of the present invention;

FIG. 8 is a molecular model illustrating direction of a global linear hydrophobic
moment of protein 1AUA according to the teachings of the present invention;

FIG. 9 is a molecular model illustrating direction of a global linear hydrophobic
moment of protein 1DZV according to the teachings of the present invention;

FIG. 10 is a table containing enhanced moment-of-geometry ratio values for
defensin and defensin-like protein structures; and

FIGS. 11A-B are molecular models illustrating the hydrophobic moment vectors
of proteins 1FD3 and 1DFN according to the teachings of the present invention.

Detailed Description of Preferred Embodiments

FIG. 1 is a flow chart illustrating an exemplary methodology for calculating a
moment of a tertiary protein structure comprising a plurality of residues. In step 102 of
FIG. 1, a centroid of the residue centroids (hereinafter "residue centroids") is calculated.
The centroid of residue centroids may represent a geometric center of the tertiary protein
structure. The centroid of a given molecule is determined by setting the mass of each
atom of the molecule to a value of one.

The present calculations are based upon the residue locations of the protein. The
center-of-geometry of the ith residue, or residue centroid, $\vec{r}_i$, is calculated with inclusion
of only the backbone α-carbon atom and exclusion of the hydrogen atoms. This
distribution of points in three-dimensional space enables calculation of the geometric
center, $\vec{r}_c$, namely, the centroid of the residue centroids:

$$\vec{r}_c = \frac{1}{n} \sum_i \vec{r}_i ,$$    [1]

wherein n is the total number of residues.
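
Equation 1 can be illustrated with a minimal Python sketch. It assumes the atom coordinates retained for each residue have already been selected by the caller; the function names and the two-residue toy input are hypothetical and are not part of the disclosed method.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def mean_point(points: Sequence[Vec3]) -> Vec3:
    """Unit-mass centroid (center-of-geometry) of a set of 3-D points."""
    n = len(points)
    if n == 0:
        raise ValueError("at least one point is required")
    return (
        sum(p[0] for p in points) / n,
        sum(p[1] for p in points) / n,
        sum(p[2] for p in points) / n,
    )


def centroid_of_centroids(
    residues: Sequence[Sequence[Vec3]],
) -> Tuple[List[Vec3], Vec3]:
    """Return the residue centroids r_i and their centroid r_c (Equation 1).

    Each entry of `residues` holds the atom coordinates kept for one residue
    (assumption: hydrogens were already dropped upstream).
    """
    residue_centroids = [mean_point(atoms) for atoms in residues]
    return residue_centroids, mean_point(residue_centroids)


# Hypothetical toy input, not taken from any PDB entry.
toy_residues = [
    [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)],
    [(4.0, 2.0, 0.0), (4.0, 4.0, 0.0), (4.0, 6.0, 0.0)],
]
r_i, r_c = centroid_of_centroids(toy_residues)
```

Note that every residue contributes one point of unit weight to r_c, regardless of how many atoms it contains.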

Linear hydrophobic imbalance about the average value of protein hydrophobicity,
$\bar{h}$, is given by the following first-order hydrophobic moment:

$$\vec{h}_1 = \frac{1}{n} \sum_i (h_i - \bar{h}) \, \vec{r}_i ,$$    [2]

wherein $\vec{h}_1$ is invariant with respect to the choice of the origin of the moment expansion
since the subtraction of the mean of the distribution yields a distribution, $(h_i - \bar{h})$, with
vanishing zero-order moment. The origin of the distribution, $h_i$, that yields the value of
$\vec{h}_1$ in Equation 2, is the centroid of the residue centroids, $\vec{r}_c$. Namely,
$\bar{h} = \frac{1}{n} \sum_i h_i$ enables Equation 2 to be written as:

$$\vec{h}_1 = \frac{1}{n} \sum_i h_i \, (\vec{r}_i - \vec{r}_c) .$$    [3]

The first-order hydrophobic imbalance about the mean value of hydrophobicity is
therefore given by a global linear hydrophobic moment calculated with the centroid of the
residue centroids as origin. Thus, as shown in step 104 of FIG. 1, the centroid of residue
centroids is used as a spatial origin of the global linear hydrophobic moment.
Identification of the spatial origin of the global linear hydrophobic moment expansion
enables explicit registration of the global linear hydrophobic moment with the underlying
tertiary protein structure.
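
The equivalence of Equations 2 and 3, and the origin invariance of the mean-subtracted form, can be checked numerically. The following is a sketch under stated assumptions: the function names, hydrophobicity values, and coordinates are hypothetical toy inputs, and a 1/n normalization is assumed for both equations.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def moment_eq2(h: Sequence[float], r: Sequence[Vec3]) -> Tuple[float, ...]:
    """Equation 2: first-order moment of the mean-subtracted hydrophobicity."""
    n = len(h)
    h_bar = sum(h) / n
    return tuple(
        sum((hi - h_bar) * ri[k] for hi, ri in zip(h, r)) / n for k in range(3)
    )


def moment_eq3(h: Sequence[float], r: Sequence[Vec3]) -> Tuple[float, ...]:
    """Equation 3: raw hydrophobicity, positions measured from the centroid r_c."""
    n = len(h)
    r_c = [sum(ri[k] for ri in r) / n for k in range(3)]
    return tuple(
        sum(hi * (ri[k] - r_c[k]) for hi, ri in zip(h, r)) / n for k in range(3)
    )


# Hypothetical toy data: four residue centroids and their hydrophobicities.
h_toy = [1.0, -1.0, 0.5, 0.5]
r_toy = [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 3.0)]
# The same structure translated by an arbitrary offset.
r_shifted = [(x + 10.0, y - 5.0, z + 2.0) for x, y, z in r_toy]

m2 = moment_eq2(h_toy, r_toy)
m2_shifted = moment_eq2(h_toy, r_shifted)
m3 = moment_eq3(h_toy, r_toy)
```

All three results agree to rounding error, which is the numerical counterpart of the statement that the moment is registered to the centroid of the residue centroids.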

An ellipsoidal characterization of protein shape is obtained by defining a second
rank geometric tensor as follows:

$$\tilde{G} = \sum_i \left( \tilde{1} \, |\vec{r}_i - \vec{r}_c|^2 - (\vec{r}_i - \vec{r}_c)(\vec{r}_i - \vec{r}_c) \right) ,$$    [4]

wherein $\tilde{1}$ is the unit dyadic. The tensor is diagonalized to provide the
moments-of-geometry, $g_1$, $g_2$ and $g_3$. These moments-of-geometry are the
moments-of-inertia of a discrete distribution of points of unit mass. The
moments-of-geometry are linearly related to the moments
described in M.H. Hao et al., Effects of Compact Volume and Chain Stiffness on the
Conformations of Native Proteins, 89 PROC. NATL. ACAD. SCI. 6614-18 (1992), the
disclosure of which is incorporated by reference herein, obtained by writing the geometric
tensor in a more symmetric form.
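
As an illustration of Equation 4, the sketch below builds the geometric tensor for a set of residue centroids and diagonalizes it. The cyclic Jacobi rotation used here is a swapped-in, dependency-free eigen-solver chosen for the example; the text does not say how the diagonalization is carried out. Function names and toy points are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Matrix = List[List[float]]


def geometric_tensor(points: Sequence[Vec3]) -> Matrix:
    """Equation 4: G = sum_i (1 |d_i|^2 - d_i d_i), with d_i = r_i - r_c."""
    n = len(points)
    r_c = [sum(p[k] for p in points) / n for k in range(3)]
    g = [[0.0] * 3 for _ in range(3)]
    for p in points:
        d = [p[k] - r_c[k] for k in range(3)]
        d2 = sum(x * x for x in d)
        for a in range(3):
            for b in range(3):
                g[a][b] += (d2 if a == b else 0.0) - d[a] * d[b]
    return g


def jacobi_eigen(sym: Matrix, max_sweeps: int = 50,
                 rel_tol: float = 1e-24) -> Tuple[List[float], Matrix]:
    """Eigenvalues and eigenvectors (columns of v) of a symmetric 3x3 matrix."""
    a = [list(row) for row in sym]
    v = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
    for _ in range(max_sweeps):
        total = sum(x * x for row in a for x in row)
        off = sum(a[i][j] ** 2 for i in range(3) for j in range(3) if i != j)
        if off <= rel_tol * total:
            break
        for p in range(2):
            for q in range(p + 1, 3):
                if a[p][q] == 0.0:
                    continue
                diff = a[q][q] - a[p][p]
                if diff == 0.0:
                    theta = math.copysign(math.pi / 4.0, a[p][q])
                else:
                    theta = 0.5 * math.atan(2.0 * a[p][q] / diff)
                c, s = math.cos(theta), math.sin(theta)
                for m in (a, v):  # right-multiply a and v by the rotation
                    for k in range(3):
                        mkp, mkq = m[k][p], m[k][q]
                        m[k][p], m[k][q] = c * mkp - s * mkq, s * mkp + c * mkq
                for k in range(3):  # left-multiply a by the transposed rotation
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return [a[i][i] for i in range(3)], v


# Hypothetical toy shape: semi-axes 3, 2, 1, rotated 45 degrees about z.
w = 1.0 / math.sqrt(2.0)
toy_points = [(3 * w, 3 * w, 0.0), (-3 * w, -3 * w, 0.0),
              (-2 * w, 2 * w, 0.0), (2 * w, -2 * w, 0.0),
              (0.0, 0.0, 1.0), (0.0, 0.0, -1.0)]
moments, axes = jacobi_eigen(geometric_tensor(toy_points))
g1, g2, g3 = sorted(moments)
```

The smallest moment-of-geometry belongs to the longest axis of the point distribution, which is why the ordering g1 < g2 < g3 places the major principal axis first.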

The aspect ratios of the moments-of-geometry provide an ellipsoidal
characterization of protein shape:

$$g_1 x_p^2 + g_2 y_p^2 + g_3 z_p^2 = d^2 ,$$    [5]

wherein $x_p$, $y_p$, $z_p$ are coordinates in the frame of the principal axes with the centroid of
the protein structure as origin. If the magnitudes are ordered as:

$$g_1 < g_2 < g_3 ,$$    [6]

then the major principal axis is of extent $d/\sqrt{g_1}$, wherein each ith residue at location
$x_{ip}$, $y_{ip}$, $z_{ip}$, in the principal axis frame, can be considered to reside on an ellipsoid
with major principal axis equal to $d_i/\sqrt{g_1}$, namely:

$$g_1 x_{ip}^2 + g_2 y_{ip}^2 + g_3 z_{ip}^2 = d_i^2 .$$    [7]

For a compact protein, the residue with the largest $d_i$ can specify the ellipsoid
defining a presumed protein surface. Residues with the same $d_i$, namely, residues
residing on the same ellipsoid, are at the same radial fractional distance from the protein
centroid to the protein ellipsoidal surface. Rewriting Equation 7 as:

$$x_{ip}^2 + g'_2 \, y_{ip}^2 + g'_3 \, z_{ip}^2 = d'^2_i ,$$    [8]

with

$$g'_2 = g_2/g_1 ; \quad g'_3 = g_3/g_1 ; \quad d'^2_i = d_i^2/g_1 ,$$    [9]

enables $d'_i$ to be used as the measure of the radial fractional distance of the ith residue
from the center of the protein to the protein surface.
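
The scaled distance of Equations 8 and 9 can be sketched as follows, assuming the residue centroids are already expressed in the principal-axis frame with the centroid at the origin. The function names, the moments-of-geometry (1, 4, 9), and the points are hypothetical values picked so that the arithmetic is easy to follow.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def ellipsoidal_distances(points_pa: Sequence[Vec3],
                          g: Tuple[float, float, float]) -> List[float]:
    """d'_i of Equations 8-9 for points given in the principal-axis frame.

    `g` holds the moments-of-geometry ordered g1 <= g2 <= g3, so that x is the
    major principal axis.
    """
    g1, g2, g3 = g
    if not 0.0 < g1 <= g2 <= g3:
        raise ValueError("expected moments-of-geometry ordered 0 < g1 <= g2 <= g3")
    g2p, g3p = g2 / g1, g3 / g1
    return [math.sqrt(x * x + g2p * y * y + g3p * z * z) for x, y, z in points_pa]


def fractional_distances(points_pa: Sequence[Vec3],
                         g: Tuple[float, float, float]) -> List[float]:
    """d'_i divided by the largest d'_i, which defines the presumed surface."""
    d = ellipsoidal_distances(points_pa, g)
    d_max = max(d)
    if d_max == 0.0:
        raise ValueError("all points coincide with the centroid")
    return [di / d_max for di in d]


# Hypothetical inputs: three points on one ellipsoid and one point half-way in.
g_toy = (1.0, 4.0, 9.0)
pts = [(2.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 2.0 / 3.0), (1.0, 0.0, 0.0)]
d_prime = ellipsoidal_distances(pts, g_toy)
fraction = fractional_distances(pts, g_toy)
```

The first three points sit at different Euclidean distances from the origin (2, 1 and 2/3) yet share the same d', which is the property the text relies on.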

The correlation between residue centroid magnitude and residue solvent
accessibility is enhanced, as shown in step 106 of FIG. 1. An exemplary embodiment for
enhancing the correlation between residue centroid magnitude and residue solvent
accessibility is described below in conjunction with the description of FIG. 2. Thus,
when defining the global linear hydrophobic moment, each residue centroid contributes a
magnitude and direction to the global linear hydrophobic moment, as shown in step 108
of FIG. 1. Further, as will be addressed in conjunction with the description of FIG. 2,
each residue centroid having the same fractional distance to the surface of the tertiary
protein structure will contribute an equivalent magnitude to the global linear hydrophobic
moment. An accurate determination of the magnitude of the global linear hydrophobic
moment is important, as the global linear hydrophobic moment may further be used to
compare tertiary protein structures, as shown in step 110 of FIG. 1, and as will be
described in detail below. Therefore, one feature that should be modified in Equation 3 is
the lever arm dependence of each hydrophobic moment. FIG. 2 is a diagram illustrating
lever arm dependence of a hydrophobic moment. As can be seen in FIG. 2, a residue near
the exterior of a protein and also near the major principal axis is at a greater distance from
the center of the protein than a residue near the exterior of the protein but near the minor
principal axis. For example, distances from the center of the protein to two residues at
different locations on the same ellipsoid, e.g., residing on the same ellipsoidal surface, are
denoted by arrows 1 and 3 in FIG. 2. Even though the two residues are at the same
fractional distance to the protein surface, the distance from the origin is different. The
two residues would therefore make different contributions to the magnitude of the vector,
$\vec{r}_i$, in Equation 3. This difference can be corrected based on a spatial linear moment of
each residue by mapping the ellipsoidal coordinates onto a sphere with radius equal to the
major principal axis. Both locations are then mapped to the positions designated by
arrows 2 and 4 in FIG. 2. Since each residue then has an approximately equivalent
magnitude, it may be assumed that they contribute an equal magnitude to the global linear
hydrophobic moment. With this mapping, Equation 3 is written as:

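$$\vec{h}_1 = \frac{1}{n} \sum_i h_i \left( x_i \hat{i} + \sqrt{g'_2} \, y_i \hat{j} + \sqrt{g'_3} \, z_i \hat{k} \right) ,$$    [10]

wherein $\hat{i}$, $\hat{j}$, $\hat{k}$ are unit vectors along the directions of the principal axes. Since Equation
10 is written in the frame of the principal axes, $\vec{r}_c$ is at the origin and does not shift with
the mapping.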
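
A minimal sketch of Equation 10 follows. It assumes coordinates already expressed in the principal-axis frame with the centroid at the origin, and moments-of-geometry supplied by the caller; the function name and all numeric inputs are hypothetical.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def mapped_moment(h: Sequence[float], points_pa: Sequence[Vec3],
                  g: Tuple[float, float, float]) -> Vec3:
    """Equation 10: moment after mapping each ellipsoid onto a sphere.

    The y and z coordinates are stretched by sqrt(g2/g1) and sqrt(g3/g1), so
    residues on the same ellipsoid end up at the same distance from the origin.
    """
    g1, g2, g3 = g
    if not 0.0 < g1 <= g2 <= g3:
        raise ValueError("expected moments-of-geometry ordered 0 < g1 <= g2 <= g3")
    sy, sz = math.sqrt(g2 / g1), math.sqrt(g3 / g1)
    n = len(h)
    return (
        sum(hi * x for hi, (x, _, _) in zip(h, points_pa)) / n,
        sum(hi * sy * y for hi, (_, y, _) in zip(h, points_pa)) / n,
        sum(hi * sz * z for hi, (_, _, z) in zip(h, points_pa)) / n,
    )


# Hypothetical toy input: six points on one ellipsoid, centroid at the origin.
g_toy = (1.0, 4.0, 9.0)
pts = [(2.0, 0.0, 0.0), (-2.0, 0.0, 0.0),
       (0.0, 1.0, 0.0), (0.0, -1.0, 0.0),
       (0.0, 0.0, 2.0 / 3.0), (0.0, 0.0, -2.0 / 3.0)]
h_toy = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
moment = mapped_moment(h_toy, pts, g_toy)
magnitude = math.sqrt(sum(c * c for c in moment))
```

Because all six points lie on the same ellipsoid, each pair contributes equally to the mapped moment even though the unmapped lever arms differ by a factor of three.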

This mapping places all residues on the same ellipsoid at the same distance from
the center of the protein. This distance metric was described in B.D. Silverman,
Hydrophobic Moments of Protein Structures: Spatially Profiling the Distribution, 98
PROC. NATL. ACAD. SCI. 4996-5001 (2001) (hereinafter "Silverman"), the disclosure of
which is incorporated by reference herein, in regard to calculating the distribution of
residue hydrophobicity from the protein interior to the protein exterior. The distance
metric correlates more closely with residue solvent accessibility, i.e., the
solvent-accessible surface area of each residue, than the residue distance from the
ellipsoidal center prior to the mapping. As such, residue centroid magnitude differences
which are not representative of residue solvent accessibility may be corrected for. FIG. 3
is a table containing correlation coefficients of distance and solvent accessibility for
soluble globular Protein Data Bank (PDB) protein structures. FIG. 3 further contains the
scaled moments-of-geometry, $g'_2$ and $g'_3$, for fifty soluble globular PDB protein structures.
The correlation coefficients obtained with the distances mapped to a sphere are
designated "ellipsoidal" while those obtained with the distances from the center of the
ellipsoid to the residue are designated "radial." Residue solvent accessibility was
obtained from the web site of the Sealy Center for Structural Biology, University of Texas
Medical Branch, Galveston, TX. Residue solvent accessibility is described, for example,
in R. Fraczkiewicz et al., A New Efficient Algorithm for Calculating Solvent Accessible
Surface Areas of Macromolecules, ECCC3, Northern Illinois University (Nov. 1996), the
disclosure of which is incorporated by reference herein. It is important to note that the
ellipsoidal correlation coefficients are not only greater than the radial correlation
coefficients for every one of the fifty proteins, but that the difference is greatest for the
proteins exhibiting the greatest deviations from sphericity.
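
The "radial" and "ellipsoidal" correlation coefficients can be computed along the following lines. This is a sketch only: it assumes a Pearson correlation coefficient (the text does not name the statistic), principal-axis coordinates supplied by the caller, and hypothetical accessibility values that are not data from FIG. 3.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def distance_correlations(points_pa: Sequence[Vec3],
                          g: Tuple[float, float, float],
                          accessibility: Sequence[float]) -> Tuple[float, float]:
    """Return (radial, ellipsoidal) correlations with solvent accessibility."""
    g2p, g3p = g[1] / g[0], g[2] / g[0]
    radial = [math.sqrt(x * x + y * y + z * z) for x, y, z in points_pa]
    ellipsoidal = [math.sqrt(x * x + g2p * y * y + g3p * z * z)
                   for x, y, z in points_pa]
    return pearson(radial, accessibility), pearson(ellipsoidal, accessibility)


# Hypothetical toy data in which accessibility tracks the ellipsoidal distance.
pts = [(2.0, 0.0, 0.0), (0.0, 1.0, 0.0), (1.0, 0.0, 0.0),
       (0.0, 0.5, 0.0), (0.0, 0.0, 0.1)]
acc = [20.0, 20.0, 10.0, 10.0, 3.0]
r_radial, r_ellipsoidal = distance_correlations(pts, (1.0, 4.0, 9.0), acc)
```

In this constructed example the ellipsoidal coefficient is essentially one while the radial coefficient is lower, mirroring the qualitative trend the text reports for elongated structures.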

There are other distance metrics that correlate more closely with residue solvent
exposure than the ellipsoidal metric, the distance between the residue centroid and
presumed ellipsoidal protein surface. Such other distance metrics, however, do not
provide a single origin or location for the moment expansion about which hydrophobic
imbalance can be calculated enabling alignment with the tertiary protein structure.

Alternatively, a global hydrophobic vector could be constructed utilizing only
vector magnitudes dependent upon the values of residue solvent exposure, $\beta_i$, and
hydrophobicity, $h_i$, and wherein the unit vector, $\hat{u}_i$, is defined as:

$$\hat{u}_i = \frac{x_i \hat{i} + y_i \hat{j} + z_i \hat{k}}{(x_i^2 + y_i^2 + z_i^2)^{1/2}} .$$    [11]

With Cartesian coordinates $x_i$, $y_i$, $z_i$ written with the centroid of the residue centroids as
origin, the following hydrophobic vector can be defined:

$$\vec{H} = \frac{1}{n} \sum_i h_i \, \beta_i \, \hat{u}_i .$$    [12]

The magnitude of the vector to the ith residue is then weighted solely by the values of the
residue solvent exposure, $\beta_i$, a solvent accessibility metric, and hydrophobicity, $h_i$.
Hydrophobic vectors exhibiting significant amphiphilicity, calculated in this manner, will
qualitatively correspond to the vectors calculated by the spatial linear moment of
Equation 10. As shown in FIG. 3, residue solvent exposure and ellipsoidal distance
correlate fairly closely. However, with such choice of origin for the calculation, Equation
12 is not a global linear hydrophobic moment and cannot be recast into a form that is a
linear invariant about the mean value of residue hydrophobicity.
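
The alternative vector of Equations 11 and 12 can be sketched as below. The symbol names `h` and `beta`, the 1/n normalization, the function name, and the toy inputs are assumptions made for the illustration.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def exposure_weighted_vector(h: Sequence[float], beta: Sequence[float],
                             points: Sequence[Vec3]) -> Vec3:
    """Sum of unit vectors from the centroid, weighted by h_i * beta_i."""
    n = len(h)
    r_c = [sum(p[k] for p in points) / n for k in range(3)]
    total = [0.0, 0.0, 0.0]
    for hi, bi, p in zip(h, beta, points):
        d = [p[k] - r_c[k] for k in range(3)]
        norm = math.sqrt(sum(x * x for x in d))
        if norm == 0.0:
            continue  # a residue exactly at the origin has no direction
        for k in range(3):
            total[k] += hi * bi * d[k] / norm
    return (total[0] / n, total[1] / n, total[2] / n)


# Hypothetical toy input: hydrophobicity, solvent exposure, and positions.
h_toy = [1.0, -1.0, 0.5, 0.5]
beta_toy = [0.5, 0.5, 1.0, 1.0]
pts = [(3.0, 0.0, 0.0), (-3.0, 0.0, 0.0), (0.0, 5.0, 0.0), (0.0, -5.0, 0.0)]
vec = exposure_weighted_vector(h_toy, beta_toy, pts)

# Doubling every distance leaves the result unchanged: only direction matters.
vec_scaled = exposure_weighted_vector(
    h_toy, beta_toy, [(2 * x, 2 * y, 2 * z) for x, y, z in pts])
```

Unlike the linear moment, this construction ignores the lever arm entirely, so a residue's contribution depends only on its direction from the origin and on the product of its exposure and hydrophobicity.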
FIG. 4 is a block diagram of an exemplary hardware implementation of a tertiary
protein structure analyzer 400 in accordance with one embodiment of the present
invention. It is to be understood that apparatus 400 may implement the methodology
described above in conjunction with the description of FIG. 1. Apparatus 400 comprises
a computer system 410 that interacts with media 450. Computer system 410 comprises a
processor 420, a network interface 425, a memory 430, a media interface 435 and an
optional display 440. Network interface 425 allows computer system 410 to connect to a
network, while media interface 435 allows computer system 410 to interact with media
450, such as a Digital Versatile Disk (DVD) or a hard drive.

As is known in the art, the methods and apparatus discussed herein may be
distributed as an article of manufacture that itself comprises a computer-readable medium
having computer-readable code means embodied thereon. The computer-readable
program code means is operable, in conjunction with a computer system such as
computer system 410, to carry out all or some of the steps to perform the methods or
create the apparatus discussed herein. The computer-readable code is configured to
calculate a centroid of residue centroids; use the centroid of residue centroids as a spatial
origin of a global linear hydrophobic moment; enhance correlation between residue
centroid magnitude and residue solvent accessibility; and define the global linear
hydrophobic moment, wherein each of the residue centroids contributes a magnitude and
direction to the global linear hydrophobic moment. The computer-readable medium may
be a recordable medium (e.g., floppy disks, hard drive, optical disks such as a DVD, or
memory cards) or may be a transmission medium (e.g., a network comprising
fiber-optics, the world-wide web, cables, or a wireless channel using time-division
multiple access, code-division multiple access, or other radio-frequency channel). Any
medium known or developed that can store information suitable for use with a computer
system may be used. The computer-readable code means is any mechanism for allowing
a computer to read instructions and data, such as magnetic variations on a magnetic
medium or height variations on the surface of a compact disk.

Memory 430 configures the processor 420 to implement the methods, steps, and
functions disclosed herein. The memory 430 could be distributed or local and the
processor 420 could be distributed or singular. The memory 430 could be implemented
as an electrical, magnetic or optical memory, or any combination of these or other types
of storage devices. Moreover, the term "memory" should be construed broadly enough to
encompass any information able to be read from or written to an address in the
addressable space accessed by processor 420. With this definition, information on a
network, accessible through network interface 425, is still within memory 430 because
the processor 420 can retrieve the information from the network. It should be noted that
each distributed processor that makes up processor 420 generally contains its own
addressable memory space. It should also be noted that some or all of computer system
410 can be incorporated into an application-specific or general-use integrated circuit.

Optional video display 440 is any type of video display suitable for interacting
with a human user of apparatus 400. Generally, video display 440 is a computer monitor
or other similar video display.

As was described above in conjunction with the description of FIG. 1, the global
linear hydrophobic moment may be used to compare protein structures. The global linear
hydrophobic moment is analogous to the dipole moment for the entire tertiary protein
structure. Defining a global linear hydrophobic moment would yield a dual measure
comprising the magnitude and direction of protein amphiphilicity. Thus, the global
linear hydrophobic moment characterizes the amphiphilicity of the protein. With such a
measure, a simple comparison of the hydrophobic imbalance, or amphiphilicity, of
different protein structures could be made. For example, two structures with the same
fold and close in root mean square deviation (RMSD) might exhibit very different
degrees of overall hydrophobic organization. Such differences would be concisely
summarized by the global linear hydrophobic moment. The direction of such moment
may also assist in identifying regions of functional interest. Further, in regard to global
structural representations of proteins, such as RMSD and radius of gyration, the global
linear hydrophobic moment may be useful in the comparison and classification of overall
protein hydrophobic organization.
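
A comparison of two structures by the magnitude and direction of their moments might be sketched as follows. The sketch assumes both moment vectors are expressed in a common coordinate frame (for instance after structural superposition), which the text does not spell out; the function name and the two vectors are hypothetical.

```python
import math
from typing import Sequence, Tuple


def compare_moments(m_a: Sequence[float],
                    m_b: Sequence[float]) -> Tuple[float, float, float]:
    """Return (|m_a|, |m_b|, angle between the two moments in degrees)."""
    mag_a = math.sqrt(sum(x * x for x in m_a))
    mag_b = math.sqrt(sum(x * x for x in m_b))
    if mag_a == 0.0 or mag_b == 0.0:
        raise ValueError("direction is undefined for a zero moment")
    cos_angle = sum(x * y for x, y in zip(m_a, m_b)) / (mag_a * mag_b)
    cos_angle = max(-1.0, min(1.0, cos_angle))  # guard against rounding
    return mag_a, mag_b, math.degrees(math.acos(cos_angle))


# Hypothetical moment vectors for two superimposed structures.
mag_a, mag_b, angle_deg = compare_moments((3.0, 4.0, 0.0), (0.0, 0.0, 2.0))
```

Two structures with similar folds but very different magnitudes or a large angle between moments would, in this picture, differ in overall hydrophobic organization.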

The magnitudes and directions of the global linear hydrophobic moments of
interacting proteins can also provide a measure of the hydrophobic imbalance arising
from protein to protein interactions. For example, the global linear hydrophobic moments
of spatially adjacent protein domains have been shown to provide a quantitative measure
of the degree of prevalence of hydrophobic residues in the region of protein domain
contact. See R. Zhou et al., Hydrophobicity of Protein Domains: Spatially Profiling
Their Distribution, DISCRETE MATHEMATICS & THEORETICAL COMPUTER SCIENCE
(DIMACS) WORKSHOP, (Feb. 27-28, 2003); R. Zhou et al., Spatial Profiling of Protein
Hydrophobicity: Native vs. Decoy Structures, RESEARCH IN COMPUTATIONAL
MOLECULAR BIOLOGY (RECOMB) (Berlin 2003), the disclosures of which are
incorporated by reference herein. Molecular moments, such as global linear hydrophobic
moments, may be used to characterize an interesting feature of protein-RNA interactions.
The ease and ability to rapidly classify lower order angular arrangements of protein
hydrophobicity is useful in connection with generating three-dimensional protein
structures.

FIG. 5 is a table containing global linear hydrophobic moment magnitudes for
fifty PDB protein structures. The global linear hydrophobic moment magnitudes in FIG.
5 were obtained using Equation 10 for each of the fifty protein structures. The values
shown have been multiplied by a factor of ten. To provide a measure for comparison of
the linear hydrophobic imbalance, 1,000 calculations have been performed for each
protein structure with the amino acid distribution randomized. The average global linear
hydrophobic moment obtained for the 1,000 runs is given in FIG. 5, designated as
"random magnitude," together with the number of runs that yielded a magnitude of the
moment that was greater than the magnitude of the moment calculated for the native
amino acid distribution, designated as "number greater."
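
The randomization procedure can be sketched as follows. The sketch interprets "amino acid distribution randomized" as shuffling the hydrophobicity values among the fixed residue positions, which is an assumption; it also uses the Equation 3 form of the moment on whatever coordinates are passed in, so sphere-mapped coordinates would have to be supplied to mirror Equation 10. Function names, the seed, and the toy data are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def moment_magnitude(h: Sequence[float], points: Sequence[Vec3]) -> float:
    """Magnitude of the first-order moment about the centroid of the points."""
    n = len(h)
    r_c = [sum(p[k] for p in points) / n for k in range(3)]
    comps = [sum(hi * (p[k] - r_c[k]) for hi, p in zip(h, points)) / n
             for k in range(3)]
    return math.sqrt(sum(c * c for c in comps))


def randomization_test(h: Sequence[float], points: Sequence[Vec3],
                       runs: int = 1000, seed: int = 0) -> Tuple[float, float, int]:
    """Return (native magnitude, mean random magnitude, number greater)."""
    rng = random.Random(seed)
    native = moment_magnitude(h, points)
    shuffled: List[float] = list(h)
    total, greater = 0.0, 0
    for _ in range(runs):
        rng.shuffle(shuffled)  # reassign hydrophobicities to residue positions
        magnitude = moment_magnitude(shuffled, points)
        total += magnitude
        if magnitude > native:
            greater += 1
    return native, total / runs, greater


# Hypothetical, maximally segregated toy structure along one axis.
toy_points = [(-2.0, 0.0, 0.0), (-1.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
toy_h = [-1.0, -1.0, 1.0, 1.0]
native, random_mean, number_greater = randomization_test(toy_h, toy_points)
```

A small "number greater" relative to the number of runs indicates that the native arrangement is more amphiphilic than most random reassignments of the same residues.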

Also provided in FIG. 5 is the mean value of hydrophobicity of each protein
based on the Neumaier hydrophobicity scale, the scale used in the calculations. The
Neumaier hydrophobicity scale will be described below in conjunction with the
description of FIG. 6. The signs of the amino acid hydrophobicity values have been
reversed for consistency with the calculations described in Silverman. The amino acid
hydrophobicity values provide a relative measure of the overall hydrophilicity of the
different proteins.

It may be noted from FIG. 5 that most of the values of the moment magnitudes
fall in a range of values that are either less than or not significantly different from a range
of values expected for a random distribution of residues. The protein with the greatest
magnitude of the global linear hydrophobic moment is 1AUA. The value of the
magnitude of this global linear hydrophobic moment is 17.09. One thousand runs with
randomization of the amino acid distribution resulted in only six runs with a global linear
hydrophobic moment of greater magnitude.

FIG. 6 is a table containing protein hydrophobicity values according to the
Neumaier hydrophobicity scale. The Neumaier hydrophobicity scale shown in FIG. 6 has
been obtained by a principal component analysis of 47 published scales. FIGS. 7A-D are
histograms illustrating random distributions of global linear hydrophobic moment
magnitudes and relationship to four native moments that exhibit significant
amphiphilicity.

The protein 1AUA, the yeast phosphatidylinositol-transfer protein, exhibits
catalytic activity between membrane layers. The carboxy-terminal forms the hydrophobic
pocket of the phospholipid-binding domain. Six β-strands constitute the pocket floor.
FIG. 8 is a molecular model illustrating direction of a global linear hydrophobic moment
of protein 1AUA. The large global linear hydrophobic moment points in the direction of
this pocket, as shown in FIG. 8, and away from the helices A2, A3 and A4. The moment
vector is plotted with respect to the original PDB coordinates.

The protein 1DZV, as well as the family of L-Fuculose-1-Phosphate Aldolase
mutants, exhibits an enhanced degree of hydrophobic imbalance. These proteins are
believed to belong to a superfamily of aldolases that catalyze carbon bond cleavage. FIG.
9 is a molecular model illustrating direction of a global linear hydrophobic moment of
protein 1DZV. The calculated global linear hydrophobic moment points away from the
amino end of the protein and the active site, which includes the zinc atom and the key
catalytic acid/base residue, GLU 73, as shown in FIG. 9. There is an imbalance in residue
hydrophobicity along the linear amino acid sequence of the protein. The first 85 residues
from the amino end have an average value of hydrophobicity equal to -0.142, whereas
the remainder of the residues have an average value equal to 0.028. Consequently, the
separated spatial locations of the residues at either of the ends of the protein contribute
significantly to the orientation of the moment vector as well as to its amplified
magnitude.

Another protein with enhanced magnitude hydrophobic moment is 2ACT, or
actinidin. Actinidin is in the papain family, as is 1YAL, for example, shown adjacent to
actinidin in FIG. 5. The two proteins have 49 percent residue identity and a combinatorial
extension (CE) RMSD of 1.2 angstroms. The magnitudes of the global linear
hydrophobic moments of 2ACT and 1YAL are, however, different. Whereas 2ACT
exhibits an enhanced value of the global linear hydrophobic moment, 1YAL has a global
linear hydrophobic moment with a magnitude within the range of values obtained by
randomizing the residue sequence. This difference highlights the independence of overall
hydrophobic spatial organization with respect to protein structure.

The proteins 1AKZ and 1UDH are another example of two proteins with high
sequence similarity and a CE RMSD of 1.4 angstroms that exhibit recognizably different
residue hydrophobicity spatial arrangements. However, as expected, two proteins from
different species in the same structural classification of proteins (SCOP) family with
nearly 100 percent sequence identity and with an RMSD of 0.7 angstroms, for example,
1BN1 and the A chain of 1G6V, have magnitudes of the global linear hydrophobic
moment that differ by eight percent.

Defensins are small antimicrobial proteins that act through the permeabilization of
bacterial membranes. Since defensins attack the bacterial cell wall through residues that
exhibit cationic and hydrophilic character, the spatial arrangement of residue
hydrophobicity is of interest. FIG. 10 is a table containing enhanced
moment-of-geometry ratio values for defensin and defensin-like protein structures. FIG.
10 includes the neurotoxin, 1SH1, and two cardiac stimulants, 1AHL (Anthopleurin-B)
and 1APF (Anthopleurin-A). Interestingly, of all eight structures, the neurotoxin and
cardiac stimulants exhibit moment magnitudes that are significantly greater than the
major fraction of the magnitudes randomly generated. The defensins, 1FD3 and 1DFN,
are dimeric in their biologically active forms. While the CE aligned regions of 1FD3 and
1DFN, which do not include the α-helix of 1FD3, exhibit very different hydrophobic
organization, the overall dimeric structures exhibit a correspondence in magnitude and
direction of global linear hydrophobic moments.

FIG. 10 shows that whereas the global linear hydrophobic moment magnitude of
1AHL is comparable to the global linear hydrophobic moment magnitudes of 1B8W and
1BNB, it exhibits a greater degree of amphipathicity than either 1B8W or 1BNB, relative
to its global linear hydrophobic moment magnitude obtained by randomization of the
amino acid location along the sequence. On average, protein structures will exhibit
enhanced values of the average global linear hydrophobic moment when the protein shape
deviates significantly from a sphere (sphericity), e.g., for structures such as 1B8W and
1BNB. Significant deviations from sphericity can be identified by enhanced values of the
moment-of-geometry ratios, $g'_2$ and $g'_3$, provided in FIG. 10. For these structures, a
greater percentage of residues reside at locations that are mapped to greater distances
when mapping to a sphere. This greater percentage emphasizes that, particularly for
small structures, the significance of the magnitude of the global linear hydrophobic
moment should be evaluated relative to the average obtained from the randomization of
amino acid location along the sequence. Consequently, 1AHL is considered to be more
amphipathic than either 1B8W or 1BNB. Such correlation between the average global
linear hydrophobic moment and deviation from sphericity is also generally noted from the
entries of the tables in FIG. 3 and FIG. 5, above. These differences are, however, not as
great as is shown in FIG. 10 since the deviations from sphericity are less for these larger
structures.

FIGS. 11A-B are molecular models illustrating the hydrophobic moment vectors
of proteins 1FD3 and 1DFN. In FIGS. 11A-B, the hydrophobic moment vectors of proteins
1FD3 and 1DFN, respectively, are superimposed upon the corresponding tertiary protein
structure. Both vectors point in the direction of hydrophobic patches. For 1FD3, the
vector points towards the center of the flat hydrophobic patches of the monomers. For
1DFN, the vector points in the direction of the apolar base of the basket-shaped dimer.
The location of segregated patches of hydrophobic residues is important regarding issues
involving the mechanism of defensin antimicrobial binding and activity.

Although illustrative embodiments of the present invention have been described
herein, it is to be understood that the invention is not limited to those precise
embodiments, and that various other changes and modifications may be made by one
skilled in the art without departing from the scope or spirit of the invention.



YOR920030162US1 






